Skipping the Frame-Level: Event-Based Piano Transcription With Neural Semi-CRFs
Piano transcription systems are typically optimized to estimate pitch activity at each frame of audio. They are often followed by carefully designed heuristics and post-processing algorithms to estimate note events from the frame-level predictions. Recent methods have also framed piano transcription as a multi-task learning problem, where the activation of different stages of a note event are estimated independently. These practices are not well aligned with the desired outcome of the task, which is the specification of note intervals as holistic events, rather than the aggregation of disjoint observations. In this work, we propose a novel formulation of piano transcription, which is optimized to directly predict note events. Our method is based on Semi-Markov Conditional Random Fields (semi-CRF), which produce scores for intervals rather than individual frames. When formulating piano transcription in this way, we eliminate the need to rely on disjoint frame-level estimates for different stages of a note event. We conduct experiments on the MAESTRO dataset and demonstrate that the proposed model surpasses the current state-of-the-art for piano transcription. Our results suggest that the semi-CRF output layer, while still quadratic in complexity, is a simple, fast and well-performing solution for event-based prediction, and may lead to similar success in other areas which currently rely on frame-level estimates.
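To make the interval-scoring idea concrete, here is a minimal sketch of semi-Markov (segmental) Viterbi decoding for a single pitch. It assumes a precomputed matrix `seg_score` where `seg_score[s, e]` scores a note event covering frames s..e inclusive, and scores silence gaps as zero; this interface is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def semi_crf_decode(seg_score, max_dur):
    """Segmental Viterbi for one pitch.

    seg_score[s, e]: model score for a single note event covering
    frames s..e inclusive; silence gaps score 0 here for simplicity.
    A hypothetical interface, not the paper's code.
    """
    T = seg_score.shape[0]
    best = np.full(T + 1, -np.inf)   # best[t] = best score over frames [0, t)
    best[0] = 0.0
    back = np.zeros(T + 1, dtype=int)
    is_note = np.zeros(T + 1, dtype=bool)
    for t in range(1, T + 1):
        # Option 1: frame t-1 is silence.
        if best[t - 1] > best[t]:
            best[t], back[t], is_note[t] = best[t - 1], t - 1, False
        # Option 2: a note event spans frames s..t-1.
        for s in range(max(0, t - max_dur), t):
            cand = best[s] + seg_score[s, t - 1]
            if cand > best[t]:
                best[t], back[t], is_note[t] = cand, s, True
    notes, t = [], T                  # backtrack to recover note intervals
    while t > 0:
        s = back[t]
        if is_note[t]:
            notes.append((s, t - 1))  # inclusive (onset_frame, offset_frame)
        t = s
    return notes[::-1]
```

Full transcription would run this per key (e.g., 88 times), and training would replace the max with the semi-CRF log-partition. The inner loop makes decoding O(T·max_dur), quadratic when durations are unbounded, matching the complexity noted in the abstract.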
How Control Information Influences Multilingual Text Image Generation and Editing?
Visual text generation has significantly advanced through diffusion models aimed at producing images with readable and realistic text. Recent works primarily use a ControlNet-based framework, employing standard font text images to control diffusion models. Recognizing the critical role of control information in generating high-quality text, we investigate its influence from three perspectives: input encoding, role at different stages, and output features. Our findings reveal that: 1) Input control information has unique characteristics compared to conventional inputs like Canny edges and depth maps. Based on these insights, we propose TextGen, a novel framework designed to enhance generation quality by optimizing control information.
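As a concrete illustration of the ControlNet-based setup the abstract refers to, the sketch below renders a glyph image in a standard font and feeds it to a ControlNet-conditioned diffusion pipeline via the diffusers library. The checkpoint path, font, and layout are placeholders, not the authors' released models.

```python
import torch
from PIL import Image, ImageDraw, ImageFont
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

def render_glyph_image(text, size=(512, 512), font_path="DejaVuSans.ttf"):
    """Render `text` in a standard font on a plain canvas: the kind of
    control image the abstract describes. Font and layout are arbitrary."""
    img = Image.new("RGB", size, "black")
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, 96)
    draw.text((32, size[1] // 2 - 48), text, fill="white", font=font)
    return img

# Placeholder checkpoint: any glyph-conditioned ControlNet would slot in here.
controlnet = ControlNetModel.from_pretrained(
    "path/to/glyph-controlnet", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

control = render_glyph_image("Hello 世界")
result = pipe(
    'a shop sign that reads "Hello 世界"',
    image=control,                    # the glyph image steers the rendered text
    num_inference_steps=30,
).images[0]
result.save("text_image.png")
```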
Why it is worth making an effort with GenAI
Students now routinely use ChatGPT and the like to help them with their homework, such as writing an essay. It takes less effort and is easier than doing the work by hand. It can even produce output as good as, if not better than, the student's own work. However, there is a growing concern that over-reliance on GenAI in this way will stifle the development of writing and critical thinking skills. How might this trend be reversed? What if students were required to make more effort when using GenAI to do their homework? It might be more challenging, but the additional effort involved could result in them learning more and having a greater sense of achievement. This tension can be viewed as a form of the effort paradox, where effort is seen both as something to be avoided and, at the same time, as something to be valued. Is it possible to let students learn sometimes with less effort and at other times with more? Students are already adept at the former, but what about the latter? Could we design new kinds of AI tools that deliberately require more effort to use, so as to deepen the learning experience? In this paper, I begin to outline what form these might take, for example, asking students to use a combination of GenAI tools with traditional learning approaches (e.g. note-taking while reading). I also discuss how else to design tools to think with that augment human cognition, where students learn more of the skills of metacognition and reflection.
- Oceania > Australia (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.51)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)
A Start To End Machine Learning Approach To Maximize Scientific Throughput From The LCLS-II-HE
Mishra, Aashwin, Seaberg, Matt, Roussel, Ryan, Poitevin, Fred, Thayer, Jana, Ratner, Daniel, Edelen, Auralee, Mehta, Apurva
With the increasing brightness of light sources, including the diffraction-limited brightness upgrade of the APS and the high-repetition-rate upgrade of the LCLS, the experiments proposed at these facilities are becoming increasingly complex. For instance, experiments at LCLS-II-HE will require the X-ray beam to be within a fraction of a micron in diameter, with pointing stability of a few nanoradians, at the end of a kilometer-long electron accelerator, a hundred-meter-long undulator section, and tens of meters of X-ray optics. This enhancement of brightness will increase the data production rate to rival the largest data generators in the world. Without real-time active feedback control and an optimized pipeline to transform measurements into scientific information and insights, researchers will drown in a deluge of mostly useless data and fail to extract the highly sophisticated insights that the recent brightness upgrades promise. In this article, we outline the strategy we are developing at SLAC to implement machine-learning-driven optimization, automation, and real-time knowledge extraction from the electron injector at the start of the electron accelerator, through the multidimensional X-ray optical systems, to the experimental endstations and the high-readout-rate, multi-megapixel detectors at LCLS, in order to deliver the design performance to users. This is illustrated via examples from accelerator, optics, and end-user applications.
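The article outlines a strategy rather than a single algorithm, but the feedback-control idea can be sketched as a derivative-free tuning loop: treat a beam-quality diagnostic as a black-box objective and let an optimizer adjust accelerator settings. Everything below (the objective, the three quadrupole knobs, the choice of Nelder-Mead) is a toy stand-in; on the real machine one would use sample-efficient methods such as Bayesian optimization against live diagnostics.

```python
import numpy as np
from scipy.optimize import minimize

def measured_beam_size(quad_settings):
    """Stand-in for a real diagnostic readback: a noisy quadratic around
    a hypothetical optimum. In practice this would trigger a machine
    measurement via the control system."""
    optimum = np.array([0.3, -0.1, 0.7])
    noise = np.random.normal(scale=1e-3)
    return float(np.sum((np.asarray(quad_settings) - optimum) ** 2) + noise)

# Nelder-Mead tolerates the noisy, derivative-free objective that online
# accelerator tuning presents; it stands in here for the sample-efficient
# ML-driven optimizers the article has in mind.
result = minimize(
    measured_beam_size,
    x0=np.zeros(3),            # current quadrupole strengths
    method="Nelder-Mead",
    options={"xatol": 1e-3, "fatol": 1e-4, "maxiter": 200},
)
print("tuned settings:", result.x, "beam-size proxy:", result.fun)
```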
- Europe > Switzerland > Geneva > Geneva (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- Energy (1.00)
- Government > Regional Government (0.46)
Learning Cascade Ranking as One Network
Wang, Yunli, Zhang, Zhen, Wang, Zhiqiang, Yang, Zixuan, Li, Yu, Yang, Jian, Wen, Shiyang, Jiang, Peng, Gai, Kun
Cascade Ranking is a prevalent architecture in large-scale top-k selection systems like recommendation and advertising platforms. Traditional training methods focus on single-stage optimization, neglecting interactions between stages. Recent advances such as RankFlow and FS-LTR have introduced interaction-aware training paradigms but still struggle to 1) align training objectives with the goal of the entire cascade ranking (i.e., end-to-end recall) and 2) learn effective collaboration patterns for different stages. To address these challenges, we propose LCRON, which introduces a novel surrogate loss function derived from the lower bound probability that ground truth items are selected by cascade ranking, ensuring alignment with the overall objective of the system. According to the properties of the derived bound, we further design an auxiliary loss for each stage to drive the reduction of this bound, leading to a more robust and effective top-k selection. LCRON enables end-to-end training of the entire cascade ranking system as a unified network. Experimental results demonstrate that LCRON achieves significant improvement over existing methods on public benchmarks and industrial applications, addressing key limitations in cascade ranking training and significantly enhancing system performance.
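The exact surrogate loss is derived in the paper; as a hedged sketch of the general idea (training all stages jointly so that ground-truth items survive every top-k cut), one can relax each stage's selection into a differentiable survival probability and penalize the joint probability, as below. The sigmoid relaxation, temperature, and two-stage setup are illustrative assumptions, not LCRON's derived bound.

```python
import torch

def soft_survival(scores, k, tau=0.1):
    """Differentiable proxy for 'item survives a top-k cut': sigmoid of
    the margin to the k-th largest score. A common relaxation, not the
    bound derived in the paper."""
    kth = scores.topk(k, dim=-1).values[..., -1:].detach()
    return torch.sigmoid((scores - kth) / tau)

def cascade_recall_loss(stage1_scores, stage2_scores, labels, k1, k2):
    """Push ground-truth items (labels == 1) to pass every stage jointly,
    so both scorers train against end-to-end recall rather than in
    isolation."""
    p_joint = soft_survival(stage1_scores, k1) * soft_survival(stage2_scores, k2)
    pos = labels.bool()
    return -torch.log(p_joint[pos].clamp_min(1e-8)).mean()

# Toy batch: 1 query, 100 candidates, 3 relevant items.
scores1 = torch.randn(1, 100, requires_grad=True)   # retrieval-stage scores
scores2 = torch.randn(1, 100, requires_grad=True)   # ranking-stage scores
labels = torch.zeros(1, 100)
labels[0, :3] = 1
loss = cascade_recall_loss(scores1, scores2, labels, k1=20, k2=5)
loss.backward()                                      # gradients reach both stages
```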
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Information Management (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Adaptive Multi-stage Density Ratio Estimation for Learning Latent Space Energy-based Model
This paper studies the fundamental problem of learning an energy-based model (EBM) in the latent space of a generator model. Learning such a prior model typically requires running costly Markov chain Monte Carlo (MCMC). Instead, we propose to use noise contrastive estimation (NCE) to discriminatively learn the EBM through density ratio estimation between the latent prior density and the latent posterior density. However, NCE typically fails to accurately estimate such a density ratio when there is a large gap between the two densities. To effectively tackle this issue and learn a more expressive prior model, we develop an adaptive multi-stage density ratio estimation scheme, which breaks the estimation into multiple stages and learns the density ratio at each stage sequentially and adaptively. The latent prior model can be gradually refined using the ratio estimated in the previous stage, so that the final latent-space EBM prior is naturally formed as the product of the ratios from all stages.
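A single stage of this scheme reduces to a standard NCE-style density ratio estimator: train a logistic classifier to separate samples of the two densities, and read the log ratio off its logit at the optimum. The sketch below shows one such stage; the network, samplers, and training loop are illustrative, and the multi-stage extension would re-run it against a prior corrected by the ratios learned so far.

```python
import torch
import torch.nn as nn

class RatioEstimator(nn.Module):
    """Logistic classifier whose logit approximates log p(z) - log q(z)
    at the optimum (the standard NCE / density-ratio result)."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, z):
        return self.net(z).squeeze(-1)

def train_stage(estimator, sample_p, sample_q, steps=1000, lr=1e-3):
    """Fit one density-ratio stage by classifying 'from p' vs 'from q'.
    sample_p / sample_q stand in for the latent posterior and (current)
    prior samplers."""
    opt = torch.optim.Adam(estimator.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        zp, zq = sample_p(256), sample_q(256)
        logits = estimator(torch.cat([zp, zq]))
        targets = torch.cat([torch.ones(len(zp)), torch.zeros(len(zq))])
        loss = bce(logits, targets)
        opt.zero_grad(); loss.backward(); opt.step()
    return estimator

# Smoke test: estimate the ratio between two Gaussians. Each later stage
# would sample from the prior corrected by the ratios learned so far,
# shrinking the gap NCE has to bridge.
est = train_stage(
    RatioEstimator(dim=2),
    sample_p=lambda n: torch.randn(n, 2) + 1.0,   # stand-in "posterior"
    sample_q=lambda n: torch.randn(n, 2),         # stand-in "prior"
)
```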
Inverting Visual Representations with Detection Transformers
Rathjens, Jan, Reyhanian, Shirin, Kappel, David, Wiskott, Laurenz
Understanding the mechanisms underlying deep neural networks in computer vision remains a fundamental challenge. While many prior approaches have focused on visualizing intermediate representations within deep neural networks, particularly convolutional neural networks, these techniques have yet to be thoroughly explored in transformer-based vision models. In this study, we apply the approach of training inverse models to reconstruct input images from intermediate layers within a Detection Transformer, showing that this approach is efficient and feasible for transformer-based vision models. Through qualitative and quantitative evaluations of reconstructed images across model stages, we demonstrate critical properties of Detection Transformers, including contextual shape preservation, inter-layer correlation, and robustness to color perturbations, illustrating how these characteristics emerge within the model's architecture. Our findings contribute to a deeper understanding of transformer-based vision models. The code for reproducing our experiments will be made available at github.com/wiskott-lab/inverse-detection-transformer.
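The inverse-model recipe is straightforward to sketch: freeze a Detection Transformer, hook one of its intermediate representations, and train a small decoder to map that representation back to pixels. The layer choice, decoder architecture, and pixel-space MSE objective below are illustrative assumptions (shapes assume 256x256 inputs), not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Frozen Detection Transformer; the inverse model below learns to map one
# of its intermediate representations back to pixels.
detr = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)
detr.eval().requires_grad_(False)

features = {}
def grab(module, inp, out):
    features["encoder"] = out
# Which stage to invert is a choice the study varies; the last encoder
# layer is picked here as an example.
detr.transformer.encoder.layers[-1].register_forward_hook(grab)

class InverseModel(nn.Module):
    """Small convolutional decoder from (HW, batch, 256) encoder tokens
    back to a 3x256x256 image. Architecture is illustrative."""
    def __init__(self, d_model=256, grid=8):
        super().__init__()
        self.grid = grid
        blocks, ch = [], d_model
        for out_ch in (128, 64, 32, 16, 8):        # 8x8 -> 256x256
            blocks += [nn.ConvTranspose2d(ch, out_ch, 4, 2, 1), nn.ReLU()]
            ch = out_ch
        blocks += [nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid()]
        self.net = nn.Sequential(*blocks)
    def forward(self, tokens):                     # tokens: (HW, B, C)
        hw, b, c = tokens.shape
        x = tokens.permute(1, 2, 0).reshape(b, c, self.grid, self.grid)
        return self.net(x)

decoder = InverseModel()
opt = torch.optim.Adam(decoder.parameters(), lr=1e-4)
images = torch.rand(2, 3, 256, 256)               # stand-in training batch
detr(images)                                      # populates features["encoder"]
recon = decoder(features["encoder"])
loss = nn.functional.mse_loss(recon, images)      # pixel loss; the paper may
opt.zero_grad(); loss.backward(); opt.step()      # use richer objectives
```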
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- Europe > Germany (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Health & Medicine (0.68)
- Information Technology (0.46)